4.2 - Design - Expanding the State and Action Space
To graduate our agent from a single-purpose actor to a decision-maker, we must evolve its environment. This involves expanding its "senses" (the observation space) and its "abilities" (the action space) to provide the necessary tools for learning the more complex task of balancing economic priorities.
This document details the design changes from the `WorkerEnv` to the new `MacroEnv`.
Agent Requirements Analysis
The new task requires the agent to:
- Sense when its supply capacity is becoming a limiting factor.
- Possess the ability to increase that supply capacity.
This mandates a direct evolution of our environment's API.
1. Observation Space Evolution
The agent needs a more complete picture of its supply situation than just `supply_left`. We will provide the raw components of supply to allow the agent to learn the relationship itself.
Design Comparison:

- `WorkerEnv` Observation Space: `Box(3,)`

| Index | Feature |
| --- | --- |
| 0 | Minerals |
| 1 | Worker Count |
| 2 | Supply Left |

- `MacroEnv` Observation Space: `Box(4,)`

| Index | Feature | Rationale for Change |
| --- | --- | --- |
| 0 | Minerals | (Unchanged) Required for affordability checks. |
| 1 | Worker Count | (Unchanged) Required as a progress metric. |
| 2 | `supply_used` | (New) Provides the "demand" side of the supply equation. |
| 3 | `supply_cap` | (New) Provides the "supply" side of the equation. |
Rationale: Providing `supply_used` and `supply_cap` separately is a more robust design. It gives the agent the raw data and allows the neural network to learn the concept of "supply pressure" on its own, which can lead to a more nuanced and effective policy.
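To make this concrete, here is a minimal sketch of how the expanded observation space could be declared, assuming a gymnasium-style environment. The class skeleton, the bounds, and the `_get_obs` helper name are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class MacroEnv(gym.Env):
    """Skeleton showing only the expanded Box(4,) observation space."""

    def __init__(self):
        super().__init__()
        # Index 0: minerals, 1: worker count, 2: supply_used, 3: supply_cap.
        # The bounds are placeholder assumptions, not tuned values.
        self.observation_space = spaces.Box(
            low=0.0, high=np.inf, shape=(4,), dtype=np.float32
        )

    def _get_obs(self, minerals, worker_count, supply_used, supply_cap):
        # Expose the raw components; the policy network is left to learn
        # "supply pressure" (supply_used approaching supply_cap) on its own.
        return np.array(
            [minerals, worker_count, supply_used, supply_cap],
            dtype=np.float32,
        )
```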
2. Action Space Evolution
To meet the new requirements, the agent must be given the ability to build a supply structure.
Design Comparison:

- `WorkerEnv` Action Space: `Discrete(2)`

| Action | Meaning |
| --- | --- |
| 0 | Do Nothing |
| 1 | Build Worker |

- `MacroEnv` Action Space: `Discrete(3)`

| Action | Meaning | Rationale for Change |
| --- | --- | --- |
| 0 | Do Nothing | (Unchanged) A passive choice is always required. |
| 1 | Build Worker | (Unchanged) The primary economic action. |
| 2 | Build Supply | (New) Directly provides the agent with the tool needed to solve the new problem. |
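A matching sketch of the expanded action space and its dispatch, again assuming a gymnasium-style API. The `game` handle and its `build_worker` / `build_supply` methods are hypothetical stand-ins for the real game interface.

```python
from gymnasium import spaces

# Discrete(3): 0 = Do Nothing, 1 = Build Worker, 2 = Build Supply.
action_space = spaces.Discrete(3)


def apply_action(action: int, game) -> None:
    # `game` is a hypothetical handle to the underlying game interface.
    if action == 1:
        game.build_worker()   # unchanged from WorkerEnv
    elif action == 2:
        game.build_supply()   # new: the tool for relieving supply pressure
    # action == 0 deliberately does nothing: the passive choice.
```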
These modifications transform the environment from a simple, single-goal task to a more dynamic system where the agent must learn to prioritize actions based on a richer set of inputs.
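For context, a typical interaction loop over the new spaces might look like the following, assuming `MacroEnv` fully implements the standard gymnasium `reset`/`step` API (the skeleton above omits those methods).

```python
env = MacroEnv()
obs, info = env.reset()
for _ in range(1_000):
    action = env.action_space.sample()  # a trained policy would replace this
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
```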